20 November 2015

Objectives for the week

  • Collaborative Research Project

  • Next Class

  • Review

  • Static maps with ggmap

  • Dynamic results presentation

    • Static website hosting with gh-pages

Collaborative Research Project (1)

Purposes: Pose an interesting research question and try to answer it using data analysis and standard academic practices. Effectively communicate your results to a variety of audiences in a variety of formats.

Deadlines:

  • Presentation: In-class 4 December

  • Website/Paper: 11 December

Collaborative Research Project (2)

The project can be thought of as a 'dry run' for your thesis with multiple presentation outputs.

Presentation: 10 minutes maximum. Engagingly present your research question and key findings to a general academic audience (fellow students).

Paper: 6,000 words maximum. Standard academic paper, properly cited, laying out your research question, literature review, data, methods, and findings.

Website: An engaging website designed to convey your research to a general audience.

Collaborative Research Project (3)

Project total: 50% of your final mark.

  • 10% presentation

  • 10% website

  • 30% paper

Collaborative Research Project (4)

As always, you should submit one GitHub repository with all of the materials needed to completely reproduce your data gathering, analysis, and presentation documents.

Note: Because you have already had two assignments to work on parts of the project, I expect high quality work.

Collaborative Research Project (5)

Find one other group to be a discussant for your presentation.

The discussants will provide a quick (max 2 minute) critique of your presentation, suggest ideas for improving your paper, and pose questions.

Office hours

I will have normal office hours (Friday 15:00-17:00, room 1.64) every week for the rest of the term.

Please take advantage of this opportunity to improve your final project.

Be prepared.

Review

  • What is the data-ink ratio? Why is it important for effective plotting?

  • Why should you avoid using the size of circles to represent the values of continuous variables?

  • Why not use red-green colour contrasts to indicate contrasting data?

  • How many decimal places should you report in a table and why?

ggmap

Last class we didn't have time to cover mapping with ggmap.

We've already seen how ggmap can be used to find latitude and longitude.

library(ggmap)

places <- c('Bavaria', 'Seoul', '6 Pariser Platz, Berlin')

geocode(places)
##         lon      lat
## 1  11.50000 49.00000
## 2 126.97783 37.56826
## 3  13.41377 52.52330

ggmap: get the map

qmap(location = 'Berlin', zoom = 15)

Plot Houston crime with ggmap

Example from: Kahle and Wickham (2013)

Use the crime data set that comes with ggmap

names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"    
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"   
## [13] "day"      "location" "address"  "lon"      "lat"

Clean data

# find a reasonable spatial extent
qmap('houston', zoom = 13) # then run gglocator(2) in RStudio and click two corners

Clean data

# only violent crimes
violent_crimes <- subset(crime,
    offense != "auto theft" & offense != "theft" &
    offense != "burglary")

# order violent crimes
violent_crimes$offense <- factor(violent_crimes$offense,
    levels = c("robbery", "aggravated assault", "rape", "murder"))

# restrict to downtown
violent_crimes <- subset(violent_crimes,
    -95.39681 <= lon & lon <= -95.34188 &
    29.73631 <= lat & lat <= 29.78400)
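
A quick way to check the three cleaning steps above is to run them on a handful of made-up rows. This is only a sketch: the toy data frame and its values are invented for illustration, and %in% is used as a shorter alternative to the chained != comparisons.

```r
# Made-up rows for illustration only (not taken from the ggmap crime data)
toy <- data.frame(
    offense = c("theft", "robbery", "murder", "burglary", "rape", "robbery"),
    lon = c(-95.36, -95.35, -95.37, -95.36, -95.50, -95.38),
    lat = c(29.75, 29.76, 29.74, 29.75, 29.75, 29.77),
    stringsAsFactors = FALSE)

# 1. Drop the property crimes
toy_violent <- subset(toy,
    !(offense %in% c("auto theft", "theft", "burglary")))

# 2. Fix the order of the categories (this becomes the legend order)
toy_violent$offense <- factor(toy_violent$offense,
    levels = c("robbery", "aggravated assault", "rape", "murder"))

# 3. Keep only points inside the downtown bounding box
toy_downtown <- subset(toy_violent,
    -95.39681 <= lon & lon <= -95.34188 &
    29.73631 <= lat & lat <= 29.78400)

nrow(toy_downtown)            # rows that survive all three steps
levels(toy_downtown$offense)  # levels keep the order set in step 2
```

Note that subset() keeps unused factor levels, so a category with no remaining observations (here "aggravated assault") is still listed in levels().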

Plot crime data

# Set up base map
HoustonMap <- qmap("houston", zoom = 14,  
                   source = "stamen", maptype = "toner")

# Add points
FinalMap <- HoustonMap +
                geom_point(aes(x = lon, y = lat, colour = offense),
                data = violent_crimes) +
                xlab('') + ylab('') +
                theme(axis.ticks = element_blank(), 
                      axis.text.x = element_blank(),
                      axis.text.y = element_blank()) + 
                guides(size = guide_legend(title = 'Offense'),
                       colour = guide_legend(title = 'Offense'))

print(FinalMap)

Interactive visualisations

When your output documents are in HTML, you can create interactive visualisations.

Potentially more engaging and could let users explore data on their own.

Interactive visualisations

Big distinction:

Client Side: Plots are created on the user's (client's) computer. Often JavaScript in the browser. You simply send them static HTML/JavaScript needed for their browser to create the plots.

Server Side: Data manipulations and/or plots (e.g. with Shiny Server) are done on a server in R. Browsers don't come with R built in.

Hosting

There are lots of free services (e.g. GitHub Pages) for hosting webpages for client side plot rendering.

You usually have to use a paid service for server side data manipulation and plotting.

Server Side Applications

You can (relatively) easily create server side web applications with R.

To do this use Shiny.

We are not going to cover Shiny in the class as it usually requires a paid service to host.

Set up for Creating Websites with Client Side Visualisations

You already know how to create HTML documents with R Markdown.

Set your code chunk to results='asis'.
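
As a rough sketch of what results='asis' does: the chunk's printed output is written into the document unchanged, so HTML or JavaScript that a package prints is interpreted by the browser instead of being shown as verbatim text. The example below writes a minimal R Markdown file from R so that the chunk header is visible. The file name, title, and chunk contents are assumptions made for illustration.

```r
# Three backticks open and close a code chunk in R Markdown
fence <- strrep("`", 3)

rmd_lines <- c(
    "---",
    "title: \"Interactive example\"",
    "output: html_document",
    "---",
    "",
    # Chunk header: results='asis' passes printed output through unchanged
    paste0(fence, "{r, results='asis', echo=FALSE}"),
    "cat('<p><em>This HTML is inserted into the page as is.</em></p>')",
    fence
)

# Hypothetical file name, written to a temporary directory
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)
```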

There is a growing set of tools for interactive plotting:

Caveat

These packages simply create an interface between R and (usually) JavaScript.

Debugging often requires some knowledge of JavaScript and the DOM.

In sum: usually simple, but can be mysteriously difficult without a good knowledge of JavaScript/HTML.

Google Plots with googleVis

The googleVis package can create Google plots from R.

Example modified from googleVis Vignettes.

# Create fake data
fake_compare <- data.frame(
                year = c("2010", "2011", "2012"),
                US = c(10,13,14),
                GB = c(23,12,32))

googleVis simple example

library(googleVis)
line_plot <- gvisLineChart(fake_compare)
print(line_plot, tag = 'chart')

Note: To show in interactive R use plot instead of print and don't include tag = 'chart'.
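
By default gvisLineChart treats the first column as the x-axis and the remaining columns as lines. A minimal sketch that names the variables explicitly and passes a few chart options is shown here. The chart title and dimensions are arbitrary choices for illustration, and the numbers are the same made-up values as in the example above.

```r
library(googleVis)

# Made-up data, with the x-axis column named for what it holds
fake_compare <- data.frame(
    year = c("2010", "2011", "2012"),
    US = c(10, 13, 14),
    GB = c(23, 12, 32))

# Name the x and y variables explicitly; options are passed as a list
line_plot <- gvisLineChart(fake_compare,
    xvar = "year", yvar = c("US", "GB"),
    options = list(title = "Fake comparison",
                   width = 600, height = 400))

# Inside an R Markdown chunk with results='asis'
print(line_plot, tag = "chart")
```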

Choropleth map with googleVis

library(WDI)
co2 <- WDI(indicator = 'EN.ATM.CO2E.PC', start = 2010, end = 2010)
co2 <- co2[, c('iso2c','EN.ATM.CO2E.PC')]

# Clean
names(co2) <- c('iso2c', 'CO2 Emissions per Capita')
co2[, 2] <- round(log(co2[, 2]), digits = 2)

# Plot
co2_map <- gvisGeoChart(co2, locationvar = 'iso2c',
                      colorvar = 'CO2 Emissions per Capita',
                      options = list(
                          colors = "['#fff7bc', '#d95f0e']"
                          ))

CO2 Emissions (log of metric tons per capita)

print(co2_map, tag = 'chart')

More Examples

Hosting a website on GitHub Pages

Set Up GitHub Pages

First create a new branch in your repository called gh-pages:

Set Up GitHub Pages

Then sync your branch with the local version of the repository.


Finally switch to the gh-pages branch.

GitHub Pages and R Markdown

You can use R Markdown to create the index.html page.

Simply place a new .Rmd file in the repository called index.Rmd and knit it to HTML. Then sync it.

Your website will now be live.

Every time you push to the gh-pages branch, the website will be updated.
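
Knitting can also be done from the R console rather than with the Knit button. The sketch below assumes the rmarkdown package and pandoc are installed. The folder name, title, and page text are placeholders, and a temporary directory stands in for your repository.

```r
library(rmarkdown)

# Stand-in for your repository folder (hypothetical path)
site_dir <- file.path(tempdir(), "my-repo")
dir.create(site_dir, showWarnings = FALSE)

# A minimal index.Rmd with placeholder content
index_rmd <- file.path(site_dir, "index.Rmd")
writeLines(c(
    "---",
    "title: \"My Project\"",
    "output: html_document",
    "---",
    "",
    "Key findings go here."
), index_rmd)

# Knit to HTML; render() returns the path of the output file,
# which is written next to index.Rmd
index_html <- render(index_rmd, quiet = TRUE)
```

Both index.Rmd and index.html then need to be committed and synced on the gh-pages branch.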

Note

Branches in a Git repository can contain completely different files.

Example: networkD3
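
A minimal sketch of a client side network plot, assuming the networkD3 package is installed. The edge list is made up for illustration. networkD3 plots are htmlwidgets, so in an R Markdown HTML document they display when the object is printed.

```r
library(networkD3)

# Made-up edge list: the first column is the source node,
# the second column is the target node
edges <- data.frame(
    source = c("A", "A", "B", "C"),
    target = c("B", "C", "C", "D"))

# By default simpleNetwork uses the first two columns as source and target
net <- simpleNetwork(edges)

# Printing the object shows the plot in the RStudio viewer
# or embeds it in a knitted HTML document
net
```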

Seminar

Begin to create a website for your project with R Markdown and graphics (either static or interactive).

If relevant include:

  • A table of key results

  • A googleVis map

  • A bar or line chart with googleVis or other package

  • A simulation plot created with Zelig showing key results from your regression analysis

Push to the gh-pages branch.